Add support for FixedSizeList to variant_to_arrow#9663
Add support for FixedSizeList to variant_to_arrow#9663rishvin wants to merge 11 commits intoapache:mainfrom
Conversation
|
@sdf-jkl could you help review this PR? |
|
Thanks @sdf-jkl for reviewing changes. I have addressed the comments, could you re-review ? |
| } | ||
|
|
||
| impl<'a> ListElementBuilder<'a> { | ||
| fn append_null(&mut self) -> Result<()> { |
There was a problem hiding this comment.
Maybe add a comment mentioning this is only used for FixedSizeList builder?
Co-authored-by: Konstantin Tarasov <33369833+sdf-jkl@users.noreply.github.com>
|
Thanks @sdf-jkl addressed new comments. |
Co-authored-by: Konstantin Tarasov <33369833+sdf-jkl@users.noreply.github.com>
|
Thanks again @sdf-jkl ! Addressed it. |
| // With `safe` cast option set to false, appending list of wrong size to | ||
| // `typed_value_builder` of type `FixedSizeList` will result in an error. In such a | ||
| // case, the provided list should be appended to the `value_builder. |
There was a problem hiding this comment.
I'm not sure the variant shredding spec allows for shredding as a fixed size list, if the resulting layout differs physically from a normal list?
Arrays can be shredded by using a 3-level Parquet list for
typed_value.If the value is not an array,
typed_valuemust be null. If the value is an array,valuemust be null.
It looks to me like any attempt to shred as fixed-sized list must either succeed (if the size is correct) or hard-fail (because value as fallback is not allowed).
There was a problem hiding this comment.
It differs physically on the Arrow side, but once we write it to Parquet it'd be same as other ListLikeArrays. But this leads to further discussion on adding FixedSizeList support for VariantArray as well as implementing other types, currently not supported in spec.
We're keeping value because we consider this a cast from Variant to FixedSizeList. The extra len check is there because there is no Variant::FixedSizeList enum to match to. If len is incorrect we consider the cast failed and proceed following the safe cast option as if typed_value is Null.
There was a problem hiding this comment.
When casting from variant to arrow, we can do whatever we want.
But this code here is about going from binary variant to shredded variant. And the variant shredding spec directly forbids value to contain a variant array, when shredding as array.
There was a problem hiding this comment.
True. I think the core issue is that Parquet currently has only one logical LIST type. If Parquet had a dedicated logical type for FixedSizeList, the spec wording could be more explicit.
Btw, there’s ongoing work on this too: apache/parquet-format#241 (recently revived).
Given the current spec text:
Arrays can be shredded by using a 3-level Parquet list for typed_value.
If the value is not an array, typed_value must be null. If the value is an array, value must be null.
I read “array” as "a value matching the specific list shape we’re shredding into". For List/LargeList/ListView it's List values, for FixedSizeList array it's a FixedSizeList value.
There was a problem hiding this comment.
From what I understand, the variant spec neither knows nor cares about the intricacies of arrow array types (it also doesn't care about spark or SQL). If we're shredding to a 3-level parquet list, and we encounter a variant array value, the resulting value column entry must be null.
There was a problem hiding this comment.
Hi, I worked on the shredding spec, and the intent of that line of the spec was to apply to any array, not just one that perfectly matches the shredding schema. For example, in a query with try_cast(v as array<variant>), an engine would be entitled to only fetch the typed_value column from parquet, and produce null for all of the rows where typed_value is null. This would break if value could contain arrays.
variant_to_arrowFixedSizeListtype support #9531Rationale for this change
Add support for
FixedSizeListwhen invokingvariant_to_arrow.What changes are included in this PR?
VariantToFixedSizeListArrowRowBuilder.FixedSizeList.Are these changes tested?
By adding few test cases.
Are there any user-facing changes?
N/A.